Software Prepromotion for Non-Uniform Cache Architecture
Abstract
As a response to growing global wire delay, non-uniform cache architecture (NUCA) has become a trend in large cache designs. The access time of a NUCA cache is determined by the distance between the processor and the cache bank containing the requested data. An important line of NUCA research therefore focuses on how to place data that is about to be used into cache banks close to the processor. This paper proposes a software prepromotion technique, which prepromotes data using prepromotion instructions in the same way that software prefetching does. Besides basic software prepromotion, this paper also proposes smart multihop software prepromotion (SMSP), very long software prepromotion (VLSP), and a combination of the two. SMSP intelligently chooses the cache banks into which the prepromoted data is best moved, and VLSP prepromotes multiple data items using one instruction. We evaluate our approaches by running 7 kernel benchmarks on a full-system simulator. Basic software prepromotion improves IPC by 2.6893% on average, SMSP by 7.0928%, and VLSP by 7.2194%. Combining SMSP and VLSP raises the average IPC improvement to 11.8650%.
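The idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the linear latency model, the cycle counts, and the names `ToyNUCA`, `prepromote`, and `prepromote_many` are all assumptions made for illustration. It ignores bank capacity and the swapping a real NUCA would perform, and it does not model how SMSP chooses a target bank; it only shows that moving a block closer to the processor before it is used lowers its access latency, and that one call can move several blocks at once, loosely in the spirit of VLSP.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNUCA:
    """Toy NUCA model: bank 0 is closest to the processor (assumed layout)."""
    num_banks: int = 8
    base_latency: int = 4   # assumed cycles to reach bank 0
    hop_latency: int = 2    # assumed extra cycles per bank of distance
    location: dict = field(default_factory=dict)  # block address -> bank index

    def place(self, addr, bank):
        """Record that block `addr` currently lives in `bank`."""
        if not 0 <= bank < self.num_banks:
            raise ValueError("bank index out of range")
        self.location[addr] = bank

    def access_latency(self, addr):
        """Latency grows linearly with distance from the processor."""
        return self.base_latency + self.hop_latency * self.location[addr]

    def prepromote(self, addr, hops=1):
        """Move a block `hops` banks closer to the processor before it is used."""
        self.location[addr] = max(0, self.location[addr] - hops)

    def prepromote_many(self, addrs, hops=1):
        """Hypothetical multi-block variant: one call moves several blocks."""
        for addr in addrs:
            self.prepromote(addr, hops)


cache = ToyNUCA()
cache.place(0x1000, 6)
latency_before = cache.access_latency(0x1000)  # 4 + 2 * 6 = 16 cycles
cache.prepromote(0x1000, hops=3)               # block moves from bank 6 to bank 3
latency_after = cache.access_latency(0x1000)   # 4 + 2 * 3 = 10 cycles
```

In this model the benefit comes entirely from issuing the move early enough that it completes before the demand access, which is the same timing constraint that software prefetching faces.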
Related resources
An Argument for Simple COMA
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture that incorporates the major strengths of several contemporary multiprocessor architectures while avoiding their most serious weaknesses. Specifically, our architecture design incorporates the automatic data migration and replication features of cache-only memory architectu...
Rapid Hardware Prototyping on Rpm-2: Methodology and Experience
Field-Programmable Gate Arrays are an emerging technology that promises easy hardware reconfigurability by software at low cost. Entire systems can be built in which some parts are programmable. Such systems implement various architectures. Each architecture prototype is a detailed hardware implementation of the architecture, including I/O, on which complex software systems can be ported. We hav...
Shared Memory Multiprocessor Architectures for Software IP Routers
In this paper, we propose new shared memory multiprocessor architectures and evaluate their performance for future Internet Protocol (IP) routers based on Symmetric Multi-Processor (SMP) and Cache Coherent Non-Uniform Memory Access (CC-NUMA) paradigms. We also propose a benchmark application suite, RouterBench, which consists of four categories of applications representing key functions on the ...
Performance Models for Electronic Structure Methods on Modern Computer Architectures
Electronic structure codes are computationally intensive scientific applications used to probe and elucidate chemical processes at an atomic level. Maximizing the performance of these applications on any given hardware platform is vital in order to facilitate larger and more accurate computations. An important part of this endeavor is the development of protocols for measuring performance, and ...
Hypercube Connectivity within ccNUMA Architecture
The Silicon Graphics Origin2000™ and Onyx2™ systems are structured to fit a customer’s specific applications and problem sizes. This is accomplished with the company’s ccNUMA (cache coherent non-uniform memory access) architecture, which can link multitudes of processors together in such a way that the number of interconnections scales with the growth of the system, avoiding the bandwidth lim...
Journal title:
- JSW
Volume 5, Issue
Pages -
Publication date: 2010